

CLAIMS

WHAT IS CLAIMED:

1. A method, comprising:

detecting an error in data stored in a storage device in a system;

determining if the detected error is correctable; and

making at least a portion of the storage device unavailable to one or more resources in the system in response to determining that the error is uncorrectable.

2. The method of claim 1, wherein detecting the error comprises detecting the error in the data using error correction code.

3. The method of claim 2, wherein determining if the detected error is correctable comprises determining that the detected error is a multi-bit error.

4. The method of claim 1, wherein determining if the detected error is correctable comprises determining that the detected error is an address parity error.

5. The method of claim 1, wherein making at least the portion of the storage device unavailable comprises making at least the portion of the storage device unavailable while the system is in operation.

6. The method of claim 1, further comprising testing the storage device based on determining that the error is uncorrectable.

7. The method of claim 6, further comprising servicing the storage device in response to testing the storage device.

8. The method of claim 7, further comprising dynamically allowing access to the storage device in response to servicing the storage device.

9. The method of claim 1, wherein the storage device includes a directory cache, and wherein making at least the portion of the storage device unavailable comprises generating a cache miss in response to a request to access the directory cache.

10. An apparatus, comprising:

a directory cache adapted to store at least one entry; and

a control unit adapted to:

determine if at least one uncorrectable error exists in the directory cache; and

place the directory cache offline in response to determining that the error is uncorrectable.

11. The apparatus of claim 10, wherein the directory cache is a three-way associative directory cache.

12. The apparatus of claim 10, wherein the control unit determines if the entry contains a multi-bit error.

13. The apparatus of claim 12, wherein the entry is an address bit entry, and wherein the control unit determines if the address parity bit entry contains an error.

14. The apparatus of claim 10, wherein the directory cache is associated with a domain, and wherein the control unit places the directory cache offline while the domain is active.

15. The apparatus of claim 14, wherein the control unit provides a cache miss to a device requesting to access the directory cache while the directory cache is offline.

16. The apparatus of claim 14, wherein the control unit tests the directory cache in response to determining that the error is uncorrectable.

17. The apparatus of claim 15, wherein the control unit causes the directory cache to be serviced in response to testing the directory cache.

18. The apparatus of claim 15, wherein the control unit places the directory cache online in response to causing the directory cache to be serviced.

19. The apparatus of claim 18, wherein the control unit places the directory cache online dynamically.

20. An article comprising one or more machine-readable storage media containing instructions that when executed enable a processor to:

determine a multiple-bit error in data stored in a storage device of a domain; and

isolate at least a portion of the storage device from one or more resources in the domain while the domain is active, in response to determining the multiple-bit error.

21. The article of claim 20, wherein the instructions when executed enable the processor to perform an ECC error check to determine the multiple-bit error in the data.

22. The article of claim 20, wherein the instructions when executed enable the processor to dynamically test the storage device in response to isolating the storage device.

23. The article of claim 20, wherein the instructions when executed enable the processor to dynamically restore the storage device in the domain.

24. The article of claim 20, wherein the instructions when executed enable the processor to provide a cause of the multiple-bit error.
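
For illustration only, the flow recited in claims 1 to 9 and 10 to 19 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the names ErrorKind, DirectoryCache, ControlUnit, lookup, and test_and_service are hypothetical, errors are injected through a made-up faults table instead of being computed by real ECC or parity hardware, and "servicing" is modeled simply as flushing the cache.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ErrorKind(Enum):
    NONE = "none"
    SINGLE_BIT = "single-bit"          # assumed correctable by ECC
    MULTI_BIT = "multi-bit"            # treated as uncorrectable (claims 3, 12)
    ADDRESS_PARITY = "address-parity"  # treated as uncorrectable (claims 4, 13)


UNCORRECTABLE = {ErrorKind.MULTI_BIT, ErrorKind.ADDRESS_PARITY}


@dataclass
class DirectoryCache:
    entries: Dict[int, str] = field(default_factory=dict)
    # Hypothetical fault-injection table: address -> error seen on read.
    faults: Dict[int, ErrorKind] = field(default_factory=dict)
    online: bool = True


class ControlUnit:
    def __init__(self, cache: DirectoryCache) -> None:
        self.cache = cache
        self.last_cause: Optional[ErrorKind] = None

    def lookup(self, address: int) -> Optional[str]:
        """Return the cached entry, or None to signal a cache miss."""
        if not self.cache.online:
            # While offline, every request is answered with a miss (claims 9, 15).
            return None
        kind = self.cache.faults.get(address, ErrorKind.NONE)
        if kind in UNCORRECTABLE:
            # Record the cause and take the cache offline; the caller keeps
            # running and simply sees a miss (claims 1, 5, 10, 14).
            self.last_cause = kind
            self.cache.online = False
            return None
        # No error, or a single-bit error assumed to be corrected in place.
        return self.cache.entries.get(address)

    def test_and_service(self) -> bool:
        """Test the offline cache, service it, and bring it back online."""
        if self.cache.online:
            return True
        # Servicing is modeled here as flushing all entries and faults.
        self.cache.entries.clear()
        self.cache.faults.clear()
        self.cache.online = True
        return self.cache.online
```

In this sketch a requester never needs to know the cache is offline: it receives the same miss it would get for an absent entry and falls back to whatever slower path it already has, which is what allows the isolation to happen while the system or domain remains active.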